Abstract. A distributed algorithm performs local computations on pieces of input and communicates the results through given communication links. When processing a massive graph in a distributed algorithm, local outputs must be configured as a solution to a graph problem without shared memory and with few rounds of communication. In this paper we consider the problem of computing a local cluster in a massive graph in the distributed setting. Computing local clusters is of interest in certain applications, such as detecting communities in social networks or groups of interacting proteins in biological networks. When the graph models the computer network itself, detecting local clusters can help to prevent communication bottlenecks. We give a distributed algorithm that computes a local cluster in time that depends only logarithmically on the size of the graph in the CONGEST model. In particular, when the conductance of the optimal local cluster is known, the algorithm runs in time entirely independent of the size of the graph and depends only on error bounds for approximation. We also show that the local cluster problem can be computed in the k-machine distributed model in sublinear time. The speedup of our local cluster algorithms is mainly due to the use of our distributed algorithm for heat kernel pagerank.

Keywords: Distributed algorithms, local cluster, sparse cut, heat kernel 
pagerank, heat kernel, random walk 


1 Introduction 

Distributed computation is an increasingly important framework as the demand 
for fast data analysis grows and data simultaneously becomes too large to fit in 
main memory. As distributed systems for large-scale graph processing such as 
Pregel [20], GraphLab [19], and Google's MapReduce [11] are rapidly developing,
there is a need for both theoretical and practical bounds in adapting classical 
graph algorithms to a modern distributed and parallel setting. 

* An extended abstract appeared in Proceedings of WAW (2015). Research supported 
in part by Office of Naval Research (ONR) N00014-13-1-0257. 
A distributed algorithm performs local computations on pieces of input and communicates the results through given communication links. When processing a massive graph in a distributed algorithm, local outputs must be configured without shared memory and with few rounds of communication. A central problem of interest is to compute local clusters in large graphs in a distributed setting.

Computing local clusters is of interest in certain applications, such as detecting communities in social networks or groups of interacting proteins in biological networks [16]. When the graph models the computer network itself, detecting local clusters can help identify communication bottlenecks, where one set of well-connected nodes is separated from another by a small number of links. Further, being able to identify the clusters quickly prevents bottlenecks from developing as the network grows.

A local clustering algorithm computes a set of vertices in a graph with a small
Cheeger ratio (or so-called conductance, as defined in Section 2.2). Moreover,
we ask that the algorithm use only local information. In the static setting, an 
important consequence of this locality constraint is running times proportional to 
the size of the output set, rather than the entire graph. In this paper, we present 
the first algorithms for computing local clusters in two distributed settings that 
finish in a sublinear number of rounds of communication. 

A standard technique in local clustering algorithms is the so-called sweep 
algorithm. In a sweep, one orders the vertices of a graph according to some 
real-valued function defined on the vertex set and then investigates the cut set 
induced by each prefix of vertices in the ordering. The classical method of spectral clustering uses eigenvectors as functions for the sweep. For local clustering algorithms, the sweep functions are based on random walks. In some of these algorithms, the efficiency of local clustering is due to the use of PageRank vectors as the sweep functions [2]. In this paper, the main leverage in the improved running times of our algorithms is to use the heat kernel pagerank vector for performing a sweep. In particular, we are able to exploit parallelism in our algorithm for computing the heat kernel pagerank and give a distributed random walk-based procedure which requires fewer rounds of communication and yet maintains similar approximation guarantees as previous algorithms.

In Section 2.1 we describe two distributed models: the CONGEST model and the k-machine model. We demonstrate in two different distributed settings that a heat kernel pagerank distribution can be used to compute local clusters with Cheeger ratio $O(\sqrt{\phi})$ when the optimal local cluster has Cheeger ratio $\phi$. With a fast, parallel algorithm for approximating the heat kernel pagerank and efficient local computations, our algorithm works on an $n$-vertex graph in the CONGEST, or standard message passing, model with high probability in at most $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\log n\right)$ rounds of communication, where $\epsilon$ is an error bound for approximation. This is an improvement over the previously best-performing local clustering algorithm in [5], which uses a personalized PageRank vector and finishes in $O\left(\frac{1}{\alpha}\log^2 n + n\log n\right)$ rounds in the CONGEST model for any $0 < \alpha < 1$. We then extend our results to the k-machine model to show that a local cluster
1.1 Related Work


The idea of computing local clusters with random walks was introduced by Lovász and Simonovits in their work analyzing the isoperimetric properties of random walks on graphs. Spielman and Teng [24] expanded upon these ideas and gave the first nearly-linear time algorithm for local clustering, improving the original framework by sparsifying the graph. The algorithm of [24] finds a local cluster with Cheeger ratio $O(\sqrt{\phi}\log^{3/2} n)$ in time nearly linear in $m$, where $m$ is the number of edges in the graph. Each of these algorithms uses the distribution of random walks of length $O(1/\phi)$. Andersen et al. [1] give a local clustering algorithm using the distribution given by a PageRank vector. Their algorithm promises an $O(\sqrt{\phi}\log^{1/2} n)$ cluster approximation and runs in time $O\left(\frac{m}{\phi}\log^4 m\right)$. Orecchia et al. use a variant of heat kernel random walks in their randomized algorithm for computing a cut in a graph with prescribed balance constraints. A key subroutine in the algorithm is a procedure for computing $e^{-A}v$ for a positive semidefinite matrix $A$ and a unit vector $v$ in time $\tilde{O}(m)$ for graphs on $n$ vertices and $m$ edges. Indeed, the heat kernel has proven to be an efficient and effective tool for local cluster detection.

Andersen and Peres [2] simulate a volume-biased evolving set process to find sparse cuts. Their algorithm improves the ratio between the running time of the algorithm on a given run and the volume of the output set while maintaining similar approximation guarantees as previous algorithms. Their algorithm is later improved in [12]. Arora, Rao, and Vazirani [3] give an $O(\sqrt{\log n})$-approximation algorithm using semi-definite programming techniques; however, it is slower than algorithms based on spectral methods and random walks.

For distributed algorithms, fast random walk-based distributed algorithms have been given for estimating the mixing time, conductance, and spectral gap of a network. In [9], distributed algorithms are derived for computing PageRank vectors in $O\left(\frac{1}{\alpha}\log n\right)$ rounds for any $0 < \alpha < 1$ with high probability. Das Sarma et al. [8] give two algorithms for computing sparse cuts in the CONGEST distributed model. The first algorithm uses random walks and is based on the analysis of [23]. By incorporating results on fast distributed random walks, they show that the distribution of a random walk of length $\ell$ can be computed in $O(\ell)$ rounds. The second algorithm in [8] uses PageRank vectors and is based on the analysis of [1]. By using the results of [9], the authors of [8] compute local clusters in $O\left(\left(\frac{1}{\phi} + n\right)\log n\right)$ rounds with standard random walks and $O\left(\frac{1}{\alpha}\log^2 n + n\log n\right)$ rounds using PageRank vectors.
2 The Setting and Our Contributions

2.1 Models of Computation 

We consider two models of distributed computation: the CONGEST model
and the k-machine model. In each, data is distributed across nodes (machines)
of a network which may communicate over specified communication links in 
rounds. Memory is decentralized, and the goal is to minimize the running time 
by minimizing the number of rounds required for computation for an arbitrary 
input graph G. We emphasize that local communication is taken to be free. 


The CONGEST model The first model we consider is the CONGEST model. 
In this model, the communication links are exactly the edges of the input graph 
and each vertex is mapped to a dedicated machine. The CONGEST (or standard message-passing) model was introduced in [22, 23] to simulate real-world bandwidth restrictions across a network.

Due to how the vertices are distributed in the network, we simplify the model 
by assuming the computer network is the input graph $G = (V, E)$ on $n = |V|$ nodes or machines and $m = |E|$ edges or communication links. Each node has a unique $\log n$-bit ID. Initially each node only possesses its own ID and the IDs of each of its neighbors, and in some instances we may allow nodes some metadata about the graph (the value of $n$, for instance). Nodes can only communicate through edges of the network and communication occurs in rounds. That is, any message sent at the beginning of round $r$ is fully transmitted and received by the end of round $r$. We assume that all nodes run with the same processing speed. Most importantly, we only allow $O(\log n)$ bits to be transmitted across any edge per round.


The k-machine model The defining difference between the k-machine model and the CONGEST model is that, whereas vertices are mapped to distinct, dedicated machines in the CONGEST model, a number of vertices may be mapped to the same machine in the k-machine model. This model is meant to more accurately simulate distributed graph computation in systems such as Pregel [20] and GraphLab [19].

We consider computing over massive datasets distributed over nodes of the k-machine network. The complete data is never known by any individual machine, and there is no shared memory. Each machine executes an instance of a distributed algorithm, and the output of each machine is with respect to the data points it hosts. A solution to a full problem is then a particular configuration of the outputs of each of the machines. The model is discussed in greater detail in Section 5.

The two models are limiting and advantageous in different ways, and one is not inherently better than the other. For instance, since many vertices are mapped to a single machine in the k-machine model, there is more “local information” available, because vertices sharing a machine can communicate for free. However, since communication is restricted to the communication links in the computer network, vertex-vertex communication is somewhat less restrictive in the CONGEST model, where links exactly correspond to edges. The consequences of these differences are largely observed in time complexity, and certain graph problems are more suited to one model than the other.

In this paper we analyze our algorithmic techniques in the CONGEST model, and then use the Conversion Theorem of [15] to give an efficient probabilistic algorithm in the k-machine model for computing local clusters.

2.2 Local Clusters and Heat Kernel Pagerank 

Throughout this paper, we consider a graph $G = (V, E)$ with $n = |V|$ and
$m = |E|$ that is connected and undirected. In this section we give some definitions
that will make our problem statement and results precise. 

Personalized heat kernel pagerank The heat kernel pagerank is so named for the heat kernel of the graph, $\mathcal{H}_t = e^{-t\mathcal{L}}$, where $\mathcal{L}$ is the normalized graph Laplacian $\mathcal{L} = D^{-1/2}(D - A)D^{-1/2}$. Here $D$ is the diagonal matrix whose entries correspond to vertex degrees and $A$ is the symmetric adjacency matrix. The heat kernel is a solution to the heat equation $\frac{\partial \mathcal{H}_t}{\partial t} = -\mathcal{L}\mathcal{H}_t$, and thus has fundamental connections to diffusion properties of a graph. Because of its connection to random walks, for heat kernel pagerank we use a similar heat kernel matrix, $H_t = e^{-tL}$, where $L = I - P$. Here, $I$ is the $n \times n$ identity matrix and $P = D^{-1}A$ is the transition probability matrix corresponding to the following standard random walk on the graph: at each step, move from a vertex $v$ to a random neighbor $u$. Then the heat kernel pagerank is defined in terms of a preference (row) vector $f$ as $\rho_{t,f} = fH_t$. When $f$, as a row vector, is some probability distribution over the vertices, the following formulation is useful for our Monte Carlo-based approximation algorithm:

$$\rho_{t,f} = \sum_{k=0}^{\infty} e^{-t}\frac{t^k}{k!}\, f P^k. \qquad (1)$$


In this paper, we consider preference vectors $f = \chi_s$ with all probability on a single vertex $s$, called the seed, and zero probability elsewhere. This is a common starting distribution for the PageRank vector as well, commonly referred to as a personalized PageRank (or PPR) vector. We will adopt similar terminology and refer to the vector $\rho_{t,s} := \rho_{t,\chi_s}$ as the personalized heat kernel pagerank vector for $s$, or simply PHKPR.
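To make (1) concrete, here is a minimal pure-Python sketch that evaluates $\rho_{t,s}$ by summing the series up to a fixed number of terms. The adjacency-dictionary graph format, the function name `phkpr_exact`, and the truncation parameter `terms` are illustrative assumptions, not part of the algorithms in this paper; the sketch is only a small exact reference for intuition.

```python
import math

def phkpr_exact(adj, s, t, terms=60):
    """Truncated evaluation of rho_{t,s} = sum_k e^{-t} t^k / k! * chi_s P^k.

    adj maps each vertex to a list of its neighbors (undirected, no isolated
    vertices); P = D^{-1} A is the standard random walk matrix.
    `terms` is a hypothetical truncation point chosen for illustration.
    """
    walk = {v: 0.0 for v in adj}   # current row vector chi_s P^k
    walk[s] = 1.0
    rho = {v: 0.0 for v in adj}
    weight = math.exp(-t)          # e^{-t} t^k / k! for k = 0
    for k in range(terms):
        for v in adj:
            rho[v] += weight * walk[v]
        # Multiply the row vector by P: each vertex splits its mass evenly
        # among its neighbors.
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            share = walk[u] / len(nbrs)
            for w in nbrs:
                nxt[w] += share
        walk = nxt
        weight *= t / (k + 1)      # advance to e^{-t} t^{k+1} / (k+1)!
    return rho

# Usage: on a triangle the return probability after k steps is
# 1/3 + (2/3)(-1/2)^k, so rho_{t,s}(s) = (1 + 2 e^{-3t/2}) / 3.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rho = phkpr_exact(triangle, s=0, t=1.0)
```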

Cheeger ratio For a non-empty subset $S \subseteq V$ of vertices in a graph, define the volume to be $\mathrm{vol}(S) = \sum_{v \in S} d_v$, where $d_v$ is the degree of vertex $v$. The Cheeger ratio of a set $S$ is defined as $\Phi(S) = \frac{|E(S, \bar{S})|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$, where we use $\bar{S}$ here to denote the set $V \setminus S$, and $E(S, \bar{S})$ is the set of edges with one endpoint in $S$ and the other in $\bar{S}$. The Cheeger ratio of a graph, then, is the minimum Cheeger ratio over all sets in the graph, $\Phi(G) = \min_{S \subset V}\Phi(S)$. The Cheeger ratio provides a quantitative measure concerning graph clusters and is related to the expansion and spectral gap of a graph [5].
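The definitions of volume and Cheeger ratio translate directly into code. The following is a minimal sketch, assuming an adjacency-dictionary representation of a simple undirected graph; the function names `volume` and `cheeger_ratio` are hypothetical.

```python
def volume(adj, S):
    """vol(S): sum of the degrees of the vertices in S."""
    return sum(len(adj[v]) for v in S)

def cheeger_ratio(adj, S):
    """Phi(S) = |E(S, S-bar)| / min(vol(S), vol(S-bar)) for a non-empty proper subset S."""
    S = set(S)
    if not S or S >= set(adj):
        raise ValueError("S must be a non-empty proper subset of V")
    # Count edges with exactly one endpoint in S (each is seen once, from its S side).
    boundary = sum(1 for u in S for w in adj[u] if w not in S)
    vol_S = volume(adj, S)
    vol_rest = volume(adj, set(adj) - S)
    return boundary / min(vol_S, vol_rest)

# Usage: two triangles joined by the single edge {2, 3}.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(cheeger_ratio(two_triangles, {0, 1, 2}))  # one cut edge, volume 7 on each side
```

Here the set $\{0, 1, 2\}$ has one boundary edge and volume 7 on each side, so its Cheeger ratio is $1/7$.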


Local cluster and sparse cut The sparse cut problem is to approximate 
the Cheeger ratio $\Phi(G)$ of the graph. This is typically done by finding a set of
vertices whose Cheeger ratio is close to $\Phi(G)$, that is, a set which approximates
the sparsest cut in the graph. For the local clustering problem, however, we 
are concerned with finding a set with small Cheeger ratio within a specified 
subset of vertices. Alternatively, one can view this as a sparse cut problem on an 
induced subgraph. This Cheeger ratio is sometimes called a local Cheeger ratio 
with respect to the specified subset. 

A local clustering algorithm promises the following. Given a set $S$ of Cheeger ratio $\phi$, many vertices in $S$ may serve as seeds for a sweep which finds a set of Cheeger ratio close to $\phi$.


2.3 Our Results 

In this work we give a distributed algorithm which computes a local cluster of Cheeger ratio $O(\sqrt{\phi})$ with high probability, while the optimal local cluster has Cheeger ratio $\phi$. Our algorithm finishes in $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\log n\right)$ rounds in the CONGEST model (Theorem 5), where $\epsilon$ is an error bound. Further, if $\phi$ is known, we show how to compute a local cluster in $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right)$ rounds (Theorem 4). Our algorithm improves on previous local clustering algorithms by eliminating a log factor in the performance guarantee. Further, its running time improves upon algorithms using standard and PageRank random walks. In particular, given the Cheeger ratio of an optimal local cluster, our algorithm runs in time dependent only upon the approximation error $\epsilon$, and is entirely independent of the input graph. The algorithms and accompanying analysis are given in Section 4.

Similar to existing local clustering algorithms, our algorithm uses a variation of random walks to compute a local cluster. However, rather than a standard random walk or a PageRank random walk with reset probabilities [1], we use the heat kernel random walk (see Section 3).

We remark that in the analysis of random walks, the usual notion of approximation is total variation distance or some other vector norm based distance. However, in the approximation of PageRank or heat kernel pagerank for large graphs, the definition of approximation is quite different. Namely, we say some vector $\hat{\rho}_{t,s}$ is an $\epsilon$-approximate PHKPR vector for $\rho_{t,s}$ with a seed vertex $s$ and diffusion parameter $t \in \mathbb{R}$ if:

1. $(1 - \epsilon)\rho_{t,s}(v) - \epsilon \le \hat{\rho}_{t,s}(v) \le (1 + \epsilon)\rho_{t,s}(v)$, and

2. for each node $v$ with $\hat{\rho}_{t,s}(v) = 0$, it must be that $\rho_{t,s}(v) \le \epsilon$.
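The two conditions can be checked mechanically. The sketch below is illustrative only: it assumes both vectors are given as dictionaries, treats a vertex missing from the estimate as having estimate 0, and reads condition 2 as restricting vertices whose estimate is zero. The function name `is_eps_approximate` is hypothetical.

```python
def is_eps_approximate(estimate, exact, eps):
    """Check the two conditions defining an eps-approximate PHKPR vector.

    estimate, exact: dicts mapping vertices to values; vertices missing from
    `estimate` are treated as having estimate 0.
    """
    for v, rho in exact.items():
        rho_hat = estimate.get(v, 0.0)
        # Condition 1: (1 - eps) rho(v) - eps <= rho_hat(v) <= (1 + eps) rho(v)
        if not ((1 - eps) * rho - eps <= rho_hat <= (1 + eps) * rho):
            return False
        # Condition 2: a zero estimate is only allowed where rho(v) <= eps
        if rho_hat == 0.0 and rho > eps:
            return False
    return True

# Usage: vertex 2 has no estimate, which is acceptable only if its true value <= eps.
exact = {0: 0.5, 1: 0.3, 2: 0.2}
estimate = {0: 0.52, 1: 0.29}
print(is_eps_approximate(estimate, exact, 0.25), is_eps_approximate(estimate, exact, 0.1))
```

The additive $-\epsilon$ slack in condition 1 is what allows small true values to be reported as zero.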
With the above definition of approximation, we here define the heat kernel pagerank approximation problem (or the PHKPR problem in short): given a vertex $s$ of a graph and a diffusion parameter $t \in \mathbb{R}$, compute values $\hat{\rho}_{t,s}(v)$ for vertices $v$. We give a distributed algorithm which solves the PHKPR problem and finishes after only $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ rounds of communication (Theorem 2).

We extend our results to the distributed k-machine model and show the existence of an algorithm which computes a local cluster over $k$ machines in

$$\tilde{O}\left(\frac{\log(\epsilon^{-1})}{\epsilon^{3}k^{2}\log\log(\epsilon^{-1})} + \frac{\log(\epsilon^{-1})}{k\log\log(\epsilon^{-1})}\max\{\ldots, \Delta\}\right)$$

rounds with high probability, where $\Delta$ is the maximum degree in the graph. We note that when hiding polylogarithmic factors, this time does not depend on the size $n$ of the graph. We compare this to an algorithm for computing a local cluster with PageRank, which will require, with high probability, a number of rounds that is linear in $n$. These results are given in Section 5.

We briefly note here that local clustering algorithms can easily be extended to sparse cut algorithms. Namely, one can sample a number of random nodes in the network and perform the local clustering algorithm from each. One node in the network can store the Cheeger ratios output by each run of the algorithm and simply return the minimal Cheeger ratio as the value of the sparsest cut in the network. In [24, 1], $O\left(\frac{n}{\sigma}\log n\right)$ nodes are enough to compute a sparsest cut with high probability, where $\sigma$ is the size of the cut set.


3 Fast Distributed Heat Kernel Pagerank Computation 

The idea of the algorithm is to launch a number of random walks from the seed node in parallel, and compute the fraction of random walks which end at a node $u$ as an estimate of the PHKPR value $\rho_{t,s}(u)$. Recall the definition of personalized heat kernel pagerank from (1), $\rho_{t,s} = \sum_{k=0}^{\infty} e^{-t}\frac{t^k}{k!}\chi_s P^k$. Then the values of this vector are exactly the distribution of the endpoint of a heat kernel random walk started at $s$: with probability $p_k = e^{-t}\frac{t^k}{k!}$, take $k$ random walk steps according to the standard random walk transition probabilities $P$ (see Section 2.2).

To be specific, the seed node $s$ initializes $r$ tokens, each of which holds a random variable $k$ corresponding to the length of its random walk. Then, in rounds, the tokens are passed to random neighbors with a count incrementor until the count reaches $k$. At the end of the parallel random walks, each node holding tokens outputs the number of tokens it holds divided by $r$ as an estimate for its PHKPR value. Algorithm 1 describes the full procedure.
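For intuition, the token procedure can be simulated sequentially on a single machine: each token draws a Poisson($t$) walk length, which has exactly the probabilities $p_k = e^{-t}t^k/k!$, walks that many steps, and the endpoint counts divided by $r$ form the estimate. This is a hypothetical sketch; the cap `K`, the sampler, and the function names are illustrative choices, and it ignores the round structure of the distributed algorithm.

```python
import math
import random

def sample_poisson(t, rng):
    """Draw k with probability e^{-t} t^k / k! (Knuth's multiplication method)."""
    threshold = math.exp(-t)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def estimate_phkpr(adj, s, t, r, K, seed=None):
    """Monte Carlo PHKPR estimate from r heat kernel random walks of length <= K."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(r):
        steps = min(sample_poisson(t, rng), K)  # walk length, capped at K
        v = s
        for _ in range(steps):
            v = rng.choice(adj[v])              # one standard random walk step
        counts[v] = counts.get(v, 0) + 1
    return {v: c / r for v, c in counts.items()}

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
est = estimate_phkpr(triangle, s=0, t=1.0, r=20000, K=20, seed=1)
```

For the triangle with $t = 1$, `est[0]` should be close to the exact value $(1 + 2e^{-3/2})/3 \approx 0.482$.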

The algorithm is based on that given in [6] in a static setting. Theorem 1 of [6] states that an $\epsilon$-approximate PHKPR vector can be computed with the above procedure by setting $r = \frac{16}{\epsilon^3}\log n$. Further, the approximation guarantee holds when limiting the maximum length of random walks to $K = O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$, so that each token is passed for $\min\{k, K\}$ rounds, where $k$ is drawn with probability $p_k$ as described above. In the static setting, this limit keeps the running time down.
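To see how differently the number of tokens and the walk-length cap scale, the following sketch evaluates $r = \frac{16}{\epsilon^3}\log n$ and $K = c\,\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}$. Natural logarithms, rounding up, and the default $c = 2$ are assumptions made only for illustration; the formula for $K$ needs $\epsilon < 1/e$ so that $\log\log(\epsilon^{-1}) > 0$. The function name `walk_parameters` is hypothetical.

```python
import math

def walk_parameters(eps, n, c=2.0):
    """Return (r, K): number of tokens and cap on walk length (illustrative rounding)."""
    if not 0 < eps < 1 / math.e:
        raise ValueError("need 0 < eps < 1/e so that log log(1/eps) > 0")
    r = math.ceil(16 / eps**3 * math.log(n))
    K = math.ceil(c * math.log(1 / eps) / math.log(math.log(1 / eps)))
    return r, K

for eps in (0.1, 0.01, 0.001):
    print(eps, walk_parameters(eps, n=10**6))
```

The token count grows like $\epsilon^{-3}$ while $K$ barely moves, which is why the number of rounds, governed by $K$ alone, stays small.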
Algorithm 1 DistributedEstimatePHKPR

input: a network modeled by a graph $G$, a seed node $s$, a diffusion parameter $t$, an error bound $\epsilon$

output: estimates $\hat{\rho}_{t,s}(v)$ of PHKPR values for nodes $v$ in the network

1: seed node $s$ generates $r = \frac{16}{\epsilon^3}\log n$ tokens $t_i$
2: $K \leftarrow c \cdot \frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}$ for any choice of $c > 1$
3: each token $t_i$ does the following: pick a value $k$ with probability $p_k = e^{-t}\frac{t^k}{k!}$, then hold the counter value $k_i \leftarrow \min\{k, K\}$
4: for iterations $j = 1, \ldots, K$ do
5:     every node $v$ performs the following in parallel:
6:         for every token $t_i$ node $v$ currently holds do
7:             if $k_i < j$ then
8:                 hold on to this token for the remaining iterations (it has taken all $k_i$ steps)
9:             else
10:                send $t_i$ to a random neighbor
11:            end if
12:        end for
13: end for
14: let $C_v$ be the number of tokens node $v$ currently holds
15: each node with $C_v > 0$ returns $C_v/r$ as an estimate for its PHKPR value $\hat{\rho}_{t,s}(v)$
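The following sketch simulates DistributedEstimatePHKPR iteration by iteration on one machine. It assumes the hold rule in which a token moves in iteration $j$ exactly when $j \le k_i$, so every token takes $\min\{k, K\}$ steps, and it records the largest number of tokens sent across a single directed edge in one iteration, the quantity bounded in the proof of Theorem 2. The function and variable names are hypothetical.

```python
import math
import random
from collections import Counter

def distributed_estimate_phkpr(adj, s, t, r, K, seed=None):
    """Simulate the K synchronous iterations; return (estimates, max tokens per edge per iteration)."""
    rng = random.Random(seed)
    threshold = math.exp(-t)

    def poisson():
        # k with probability e^{-t} t^k / k! (Knuth's multiplication method)
        k, product = 0, rng.random()
        while product > threshold:
            k += 1
            product *= rng.random()
        return k

    lengths = [min(poisson(), K) for _ in range(r)]  # counter values k_i
    position = [s] * r                               # every token starts at the seed
    max_load = 0
    for j in range(1, K + 1):
        load = Counter()                             # tokens sent over each directed edge
        for i in range(r):
            if j <= lengths[i]:                      # token still walking: take step j
                u = position[i]
                w = rng.choice(adj[u])
                load[(u, w)] += 1
                position[i] = w
        if load:
            max_load = max(max_load, max(load.values()))
    counts = Counter(position)                       # C_v for every node v
    return {v: c / r for v, c in counts.items()}, max_load

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
estimates, max_load = distributed_estimate_phkpr(triangle, s=0, t=1.0, r=20000, K=20, seed=3)
```

Because a token only carries its counter, the per-edge load is a count of small messages, which is the point of the congestion argument.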


In contrast, the distributed algorithm DistributedEstimatePHKPR takes advantage of decentralized control to take multiple random walk steps via multiple edges at a time. That is, through parallel execution, the running time depends only on the length of random walks, whereas when running the random walks in serial, as in [6], the running time must also include the number of random walks performed. Thus, keeping $K$ small is critical in keeping the number of rounds low, and is the key to the efficiency of our local clustering algorithms.

The correctness of the algorithm follows directly from Theorem 1 in [6], and is stated here without proof. The authors additionally give empirical evidence of the correctness of the algorithm with $r$ proportional to $\log n$ and $K = \frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}$ in an extended version of that paper. They specifically demonstrate that the ranking of nodes obtained with an $\epsilon$-approximate PHKPR vector computed this way is very close to the ranking obtained with an exact vector.

Theorem 1. For any network $G$, any seed node $s \in V$, and any error bound $0 < \epsilon < 1$, the distributed algorithm DistributedEstimatePHKPR outputs an $\epsilon$-approximate PHKPR vector with probability at least $1 - \epsilon$.

The correctness of the algorithm holds for any choice of $t$, and in fact we use a particular value of $t$ in our local clustering algorithm (see Section 4). Regardless, it is clear that the running time is independent of any choice of $t$. In fact, we demonstrate in the proof of Theorem 2 that it is independent of $n$ as well.
Theorem 2. For any network $G$, any seed node $s \in V$, and any error bound $0 < \epsilon < 1$, the distributed algorithm DistributedEstimatePHKPR finishes in $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ rounds.

Proof. We show that there is no congestion in the network during any round of the algorithm; i.e., there are never more than $O(\log n)$ bits sent over any edge in any iteration of the random walk process. The proof then follows since each step of the random walk requires only one round of computation.

In any run of the algorithm, $\frac{16}{\epsilon^3}\log n$ tokens are created, each holding a message $k_i$ corresponding to a random walk length. The token contains no other information. In particular, no node IDs are transmitted through the tokens. Therefore passing a token involves sending a message of constant size in any iteration of the algorithm. In the worst case, every token is transmitted through a single edge in a single iteration of the algorithm. However, this is still only $O\left(\frac{1}{\epsilon^3}\log n\right)$ bits, which is $O(\log n)$ for any fixed $\epsilon$, and so meets the constraints of the model. Namely, even the worst case of sending every token over one edge can be done with a single round of communication. Therefore any random walk step requires only one round of communication, and by construction at most $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ random walk steps are performed in the algorithm. □


4 Distributed Local Cluster Detection 

In this section we present a fast, distributed algorithm for the local clustering 
problem. The backbone of the algorithm involves investigating sets of nodes 
which accumulate in decreasing order of their $\hat{\rho}_{t,s}(v)/d_v$ values. The process is
efficient and requires at most one linear scan of the nodes in the network (we 
actually show that the process can be much faster). 

We describe the algorithm presently. Let $p$ be any function over the nodes of the graph, and let $\pi$ be the ordering of the nodes in decreasing order of $p(v)/d_v$. Then the majority of the work of the algorithm is investigating sufficiently many of the $n-1$ cuts $(S_j, \bar{S}_j)$ given by the first $j$ nodes in the ordering and the last $n-j$ nodes in the ordering, respectively, for $j = 1, \ldots, n-1$. However, by “sufficiently many” we indicate that we may stop investigating the cut sets when either the volume or the size of the set $S_j$ is large. Assume this point is after $j = \hat{j}$. Then we choose the cut set that yields the minimum Cheeger ratio among the $\hat{j}$ possible cut sets. We call this process a sweep. As such, our local clustering algorithm is a sweep of a PHKPR distribution vector.
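A sequential sketch of the sweep just described: order the vertices with $p(v) > 0$ by $p(v)/d_v$, evaluate each prefix $S_j$, and keep the best one. For brevity it recomputes each Cheeger ratio from scratch and omits any size or volume cutoff; the function names and graph format are illustrative assumptions.

```python
import math

def cheeger(adj, S, total_volume):
    """Cheeger ratio of S given the volume of the whole graph."""
    boundary = sum(1 for u in S for w in adj[u] if w not in S)
    vol_S = sum(len(adj[u]) for u in S)
    return boundary / min(vol_S, total_volume - vol_S)

def sweep(adj, p):
    """Return (best prefix S_j, its Cheeger ratio) for the ordering by p(v)/d_v."""
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    order = sorted((v for v in adj if p.get(v, 0) > 0),
                   key=lambda v: p[v] / len(adj[v]), reverse=True)
    best_set, best_ratio = None, math.inf
    prefix = set()
    for v in order:
        prefix.add(v)
        if len(prefix) == len(adj):   # S_j = V does not define a cut
            break
        ratio = cheeger(adj, prefix, total_volume)
        if ratio < best_ratio:
            best_set, best_ratio = set(prefix), ratio
    return best_set, best_ratio

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p = {0: 0.3, 1: 0.3, 2: 0.25, 3: 0.1, 4: 0.03, 5: 0.02}
best_set, best_ratio = sweep(two_triangles, p)
```

With these made-up values concentrated on the first triangle, the sweep recovers the set $\{0, 1, 2\}$ with Cheeger ratio $1/7$.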

In the static setting, this process will take $O(n\log n)$ time in general. The authors in [8] give a distributed sweep algorithm that finishes in $O(n)$ rounds. We improve the analysis of [8] using a PHKPR vector. The running time of our sweep algorithm is given in Lemma 1.

Algorithm 2 DistributedLocalCluster

input: a network modeled by a graph $G$, a seed node $s$, a target cluster size $\sigma$, a target cluster volume $\varsigma$, an optimal Cheeger ratio $\phi$, an error bound $\epsilon$
output: a set of nodes $S$ with $\Phi(S) \in O(\sqrt{\phi})$

1: $t \leftarrow \phi^{-1}\log\left(\frac{2\sqrt{\varsigma}}{1-\epsilon} + 2\epsilon\sigma\right)$
2: compute PHKPR values $\hat{\rho}_{t,s}(v)$ with DistributedEstimatePHKPR($G, s, t, \epsilon$)
3: every node $v$ with a non-zero PHKPR value estimate sends $\hat{\rho}_{t,s}(v)/d_v$ to every other node with a non-zero PHKPR value estimate ▷ Phase 1
4: let $\pi$ be the ordering of nodes in decreasing order of $\hat{\rho}_{t,s}(v)/d_v$ ▷ Phase 1
5: compute Cheeger ratios of each of the cut sets with a call of the Distributed Sweep Algorithm and output the cut set of minimum Cheeger ratio ▷ Phase 2

The sweep involves two phases. In Phase 1, the goal is for each node to know its place in the ordering $\pi$. Each node can compute its own $\hat{\rho}_{t,s}(v)/d_v$ value locally, and we use $O(1/\epsilon)$ rounds to ensure each node knows the $\pi$ values of all other nodes (see the proof of Lemma 1). In Phase 2, we use the decentralized sweep of [8] described presently:

Distributed Sweep Algorithm. Let $N$ denote the number of nodes with a non-zero estimated PHKPR value after running the algorithm DistributedEstimatePHKPR. Assume each node knows its position in ordering $\pi$ after Phase 1. We will refer to nodes by their place in the ordering. Define $S_j$ to be the cut set of the first $j$ nodes in the ordering. Then computing the Cheeger ratio of each cut set $S_j$ involves a computation of the volume of the set as well as $|E(S_j, \bar{S}_j)|$. Define the following:

- $L_j$ is the number of neighbors of node $j$ in $S_{j-1}$, and

- $R_j$ is the number of neighbors of node $j$ in $\bar{S}_j$.

Then the Cheeger ratio of each cut set can be computed locally by:

$$|E(S_j, \bar{S}_j)| = |E(S_{j-1}, \bar{S}_{j-1})| - L_j + R_j, \quad \text{with } |E(S_1, \bar{S}_1)| = d_1, \qquad (2)$$
$$\mathrm{vol}(S_j) = \mathrm{vol}(S_{j-1}) + L_j + R_j, \quad \text{with } \mathrm{vol}(S_1) = d_1. \qquad (3)$$
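The recurrences (2) and (3) let a single pass over the ordering produce every Cheeger ratio from the pairs $(L_j, R_j)$. The following is a minimal sketch, assuming a simple graph (no self-loops or multi-edges, so $L_j + R_j = d_j$) and treating vertices outside the ordering as lying in $\bar{S}_j$; the function name `sweep_ratios` is hypothetical.

```python
def sweep_ratios(adj, order):
    """Cheeger ratios Phi(S_1), Phi(S_2), ... computed with recurrences (2) and (3)."""
    position = {v: i for i, v in enumerate(order, start=1)}  # 1-based place in pi
    outside = len(order) + 1            # rank given to vertices not in the ordering
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    boundary = volume = 0               # values for the empty set S_0
    ratios = []
    for j, v in enumerate(order, start=1):
        L = sum(1 for w in adj[v] if position.get(w, outside) < j)  # neighbors in S_{j-1}
        R = sum(1 for w in adj[v] if position.get(w, outside) > j)  # neighbors in S_j-bar
        boundary = boundary - L + R     # recurrence (2)
        volume = volume + L + R         # recurrence (3)
        if volume == total_volume:      # S_j = V is not a cut
            break
        ratios.append(boundary / min(volume, total_volume - volume))
    return ratios

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
                 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
ratios = sweep_ratios(two_triangles, [0, 1, 2, 3, 4, 5])
```

Starting from the empty set reproduces the base cases: for the first node $L_1 = 0$ and $R_1 = d_1$.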

We now show that a sweep can be performed in $O(N)$ rounds. Each node knows the IDs of its neighbors and after Phase 1 each node knows the place of every other node in the ordering $\pi$. Therefore, each node can compute locally if a neighbor is in $S_{j-1}$ or $\bar{S}_j$, and so $L_j$ and $R_j$ can be computed locally for each node $j$. Each node can then prepare an $O(\log n)$-bit message of the form $(\mathrm{ID}, L_j, R_j)$. Each of the $N$ messages of this form can then be sent to the first node in the ordering using the upcasting algorithm (described in the proof of Lemma 1), using the $\pi$ ordering as node rank. We note that the $N$ nodes in the ordering are necessarily in a connected component of the network, and so the upcasting procedure can be performed in $O(N)$ rounds. Finally, once the first node in the ordering is in possession of the ordering $\pi$ and the values of $(L_j, R_j)$ for every node in the ordering, it may iteratively compute $\Phi(S_j)$ locally using the rules (2) and (3). Thus, this node can output the minimum Cheeger ratio $\phi^*$ as well as the $j^*$ such that $\Phi(S_{j^*}) = \phi^*$ after $O(N)$ rounds.
Lemma 1. Performing Phases 1 and 2 of a distributed sweep takes $O(1/\epsilon)$ rounds.

Proof. First we describe how to send $N$ messages of size $O(\log n)$ to a single node in $O(N)$ rounds of communication. For this we can use the upcasting algorithm of [23] (as described in [8]). We first construct a priority BFS tree of the $N$ nodes with non-zero PHKPR value. We emphasize again that these nodes are necessarily in a connected component of the network, and it is shown in [23] that such a BFS tree can be constructed in $O(N)$ time. Each node in the tree then upcasts its message to the root node through the edges of the tree.

In Phase 1, the nodes need to be sorted according to their (non-zero) $\hat{\rho}_{t,s}(v)/d_v$ values. In this case, the nodes use their $\hat{\rho}_{t,s}(v)/d_v$ value as their rank so that the node with the highest $\hat{\rho}_{t,s}(v)/d_v$ value is the root of the tree. Then each node upcasts its $\hat{\rho}_{t,s}(v)/d_v$ value to the root through the edges of the tree. The root node locally sorts these values and then floods all the $\pi$ values to the nodes through tree edges. The upcast and flooding processes take $O(N)$ rounds to reach each of the nodes in the tree.

Phase 2 consists of the Distributed Sweep Algorithm, where the first node in the ordering computes the Cheeger ratio of $S_j$ for each node $j$ in the ordering. In order to send each of the $(\mathrm{ID}, L_j, R_j)$ messages to the first node of the ordering we again upcast through the edges of a priority BFS tree; however, in this round we use $\pi$ values as node rank. The root node is then able to locally compute Cheeger ratios and output the cut set of minimum Cheeger ratio after $O(N)$ rounds for upcasting.

Thus Phase 1 requires $O(N)$ rounds for upcasting and flooding values. Phase 2 requires $O(N)$ rounds for upcasting values necessary for locally computing Cheeger ratios. Since we compute an $\epsilon$-approximate PHKPR vector as our distribution, we know that $N$ is no more than $O(1/\epsilon)$. This is because we assume $\sum_{v \in V}\rho_{t,s}(v) = 1$, and so no more than $1/\epsilon$ vertices can have values at least $\epsilon$. Thus the full sweep takes $O(1/\epsilon)$ rounds. □
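The $O(N)$ bound for upcasting rests on pipelining: every tree edge forwards one $O(\log n)$-bit message per round, so messages stream toward the root instead of waiting for one another. The sketch below simulates this on a plain BFS tree (not the priority BFS tree of [23]) and counts rounds until the root holds all $N$ messages. It assumes the $N$ nodes induce a connected subgraph containing the root, and all names are hypothetical.

```python
from collections import deque

def bfs_tree(adj, root, nodes):
    """Parent pointers of a BFS tree over the subgraph induced by `nodes`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in parent:
                parent[w] = u
                queue.append(w)
    return parent

def upcast_rounds(adj, root, nodes):
    """Rounds until the root holds one message from every node (one message per edge per round)."""
    parent = bfs_tree(adj, root, nodes)
    if len(parent) != len(nodes):
        raise ValueError("nodes must induce a connected subgraph containing root")
    pending = {v: deque([v]) for v in parent}  # messages waiting at each node
    collected = 1                              # the root's own message
    rounds = 0
    while collected < len(parent):
        rounds += 1
        # Synchronous round: every non-root node forwards at most one message.
        sent = [(v, pending[v].popleft()) for v in parent
                if v != root and pending[v]]
        for v, msg in sent:
            if parent[v] == root:
                collected += 1
            else:
                pending[parent[v]].append(msg)
    return rounds

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(upcast_rounds(path, 0, set(path)))
```

On a path of $N$ nodes rooted at one end, the messages arrive at the root one per round, so the simulation reports $N - 1$ rounds, in line with the $O(N)$ bound.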

We note here that the time required for the sweep may be reduced if there are size or volume constraints for the cut set. In this case, an alternative distributed sweep algorithm may be utilized. As usual, we refer to each node by its place in the ordering $\pi$. Node 1 begins the sweep by sending $\mathrm{vol}(S_1)$ and $|E(S_1, \bar{S}_1)|$ to node 2. Then nodes $j = 2, \ldots, N$ iteratively compute $\Phi(S_j)$ using the values of $\mathrm{vol}(S_{j-1})$, $|E(S_{j-1}, \bar{S}_{j-1})|$, $L_j$, and $R_j$, and subsequently send $\mathrm{vol}(S_j)$ and $|E(S_j, \bar{S}_j)|$ to the next node $j+1$ in the ordering. Additionally, each node can send the minimum Cheeger ratio $\phi^*$ computed thus far as well as the $j^*$ such that $\Phi(S_{j^*}) = \phi^*$. Thus the last node in the ordering can output $S_{j^*}$. In this algorithm, each iteration $j$ will require $d$ rounds of communication, where $d$ is the shortest path distance between nodes $j-1$ and $j$. However, no two nodes will ever be at a distance greater than $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ steps by construction, since every node holding a token is within $K$ steps of the seed $s$. In this way, the first $j$ Cheeger ratios can be computed in $O\left(j\,\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ rounds.

If size or volume constraints are placed on the cluster, we may stop the sweep at node $j$ once the size or volume of $S_j$ exceeds some specified value. We output the set $S_{j^*}$ for that iteration, and this process requires $O\left(j\,\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ rounds.
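The node-to-node sweep above can be simulated sequentially to see both the incremental update and the round count. This is a sketch under the following assumptions:

- The count of a node's neighbors ranked before it is used as a hypothetical stand-in for the quantities $L_j$ and $R_j$, whose exact definitions are given earlier in the paper.
- Each hand-off costs the BFS hop distance between consecutive nodes.
- An optional volume cap stops the sweep as soon as $\mathrm{vol}(S_j)$ exceeds it.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def hop_distance(graph: Graph, src: int, dst: int) -> int:
    """BFS hop distance: the number of rounds d one hand-off costs."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    raise ValueError("dst is not reachable from src")


def relay_sweep(
    graph: Graph, order: List[int], vol_cap: Optional[int] = None
) -> Tuple[List[int], float, int]:
    """Simulate the relay sweep along `order`; return (S_{j*}, phi*, rounds).

    If vol_cap is given, stop at the first j with vol(S_j) > vol_cap
    (that S_j is not considered).
    """
    deg = {v: len(nbrs) for v, nbrs in graph.items()}
    total_vol = sum(deg.values())
    rank = {v: i for i, v in enumerate(order)}
    # State carried in the message from node j-1 to node j.
    vol, cut, best_phi, best_j = 0, 0, float("inf"), 0
    rounds = 0
    for i, v in enumerate(order):
        if i > 0:
            rounds += hop_distance(graph, order[i - 1], v)
        # Neighbors of v ranked before it (hypothetical stand-in for L_j).
        earlier = sum(1 for u in graph[v] if rank.get(u, len(order)) < i)
        vol += deg[v]
        if vol_cap is not None and vol > vol_cap:
            break
        # Edges into S_{j-1} leave the cut; v's remaining edges join it.
        cut += deg[v] - 2 * earlier
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_j = cut / denom, i + 1
    return order[:best_j], best_phi, rounds
```

Stopping early trades cluster quality for fewer rounds. With a cap, the round count grows only with the number of nodes actually visited, not with $N$.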


The algorithm DistributedLocalCluster (Algorithm 3) is a complete description of our distributed local clustering algorithm. Its correctness follows directly from [5], and we omit the proof here.

Theorem 3. For any network $G$, suppose there is a set $S$ with Cheeger ratio $\phi$. Then at least half of the vertices in $S$ can serve as the seed $s$ such that, for any error bound $0 < \epsilon < 1$, the algorithm DistributedLocalCluster finds a set of Cheeger ratio $O(\sqrt{\phi})$ with probability at least $1 - \epsilon$.

Theorem 4. For any network $G$, any seed node $s \in V$, and any error bound $0 < \epsilon < 1$, the algorithm DistributedLocalCluster finishes in $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right)$ rounds.


Proof. The only distributed computations are those for computing approximate PHKPR values (line 2 of Algorithm 3) and Phases 1 and 2 of the distributed sweep. Computing PHKPR values takes $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ rounds by Theorem 2, and Phases 1 and 2 together take $O(1/\epsilon)$ rounds by Lemma 1. The running time follows. □

One possible concern with the algorithm DistributedLocalCluster is that one cannot guarantee knowing the value of $\phi$ ahead of time for any particular node $s$. A true local clustering algorithm should therefore be able to proceed without this information. This can be achieved by "testing" a few values of $\phi$ (and fixing some reasonable values for $a$ and $c$). Namely, begin with $\phi = 1/2$ and run the algorithm above. If the output cut set $S$ satisfies $\Phi(S) \le O(\sqrt{\phi})$, we are done. If not, halve the value of $\phi$ and continue. In a connected graph every Cheeger ratio is at least $1/\mathrm{vol}(G) \ge 1/n^2$, so there are $O(\log n)$ such guesses, and we arrive at the following.


Theorem 5. For any network $G$, any node $s$, and any error bound $0 < \epsilon < 1$, there is a distributed algorithm that computes a set $S$ with Cheeger ratio within a quadratic of the optimal and finishes in $O\left(\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right)\log n\right)$ rounds.


In particular, when ignoring polylogarithmic factors, the running time is

$$\tilde{O}\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right).$$
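The guessing procedure behind Theorem 5 can be sketched as a loop that halves $\phi$ until the returned cut passes the check. In this sketch, `run(phi)` is a hypothetical stand-in for one execution of DistributedLocalCluster tuned for conductance $\phi$. The constant `c` is an assumed stand-in for the constant hidden in $O(\sqrt{\phi})$. The loop stops at $\phi = 1/n^2$, the lower bound on Cheeger ratios in a connected graph.

```python
import math
from typing import Any, Callable, Optional, Tuple


def cluster_without_phi(
    run: Callable[[float], Tuple[Any, float]], n: int, c: float = 1.0
) -> Optional[Tuple[Any, float, float]]:
    """Halve the guess for phi until the returned cut S satisfies Phi(S) <= c*sqrt(phi).

    `run(phi)` is hypothetical: one run of the local clustering algorithm for
    target conductance phi, returning (S, Phi(S)).
    """
    phi_floor = 1.0 / (n * n)  # Cheeger ratios in a connected graph are >= 1/n^2
    phi = 0.5
    while phi >= phi_floor:  # about 2*log2(n) iterations at most
        cut_set, ratio = run(phi)
        if ratio <= c * math.sqrt(phi):
            return cut_set, ratio, phi
        phi /= 2
    return None
```

Each guess costs one full run of the algorithm. This is where the extra $\log n$ factor in Theorem 5 comes from.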


5 Computing Local Clusters in the k-Machine Model

In this section we consider a graph on $n$ vertices that is distributed across $k$ machines in a computer network. This is the k-machine model introduced earlier in the paper.

In the k-machine model, we consider a network of $k > 1$ distinct machines that are pairwise interconnected by bidirectional point-to-point communication links. Each machine executes an instance of a distributed algorithm. The computation advances in rounds; in each round, machines can exchange messages through their communication links. We again assume that each link has a bandwidth of $O(\log n)$, meaning that $O(\log n)$ bits may be transmitted through a link in any round. We also assume no shared memory and no other means of communication between machines. When we say an algorithm solves a problem in $x$ rounds, we mean that $x$ is the maximum number of rounds until termination of the algorithm, over all $n$-node, $m$-edge graphs $G$.

In this model we are solving massive graph problems in which the vertices of the graph are distributed among the $k$ machines. We assume $n > k$ (typically $n \gg k$). Initially, no single machine knows the entire graph. Instead, the graph is partitioned among the $k$ machines in a "balanced" fashion, so that the nodes and/or edges are divided approximately evenly among the machines. There are several ways of partitioning vertices. We consider a random partition, in which vertices and their incident edges are randomly assigned to machines. Formally, each vertex $v$ of $G$ is assigned independently and randomly to one of the $k$ machines, which we call the home machine of $v$. The home machine of $v$ thereafter knows the ID of $v$ as well as the IDs and home machines of the neighbors of $v$.
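The random vertex partition can be sketched in a few lines. This sketch assumes each vertex picks its home machine uniformly at random, and that a machine's local state is a map from each hosted vertex to its neighbors paired with their home machines. The data layout is illustrative, not prescribed by the model.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def random_vertex_partition(graph: Graph, k: int, seed: int = 0):
    """Assign each vertex independently and uniformly to one of k machines.

    Returns (home, local): home[v] is v's home machine, and local[i] maps each
    vertex hosted on machine i to a sorted list of (neighbor, neighbor's home).
    """
    rng = random.Random(seed)
    home = {v: rng.randrange(k) for v in graph}
    local: List[Dict[int, List[Tuple[int, int]]]] = [dict() for _ in range(k)]
    for v, nbrs in graph.items():
        # The home machine knows v's ID plus the IDs and home machines of v's neighbors.
        local[home[v]][v] = sorted((u, home[u]) for u in nbrs)
    return home, local
```

With $n \gg k$, each machine hosts about $n/k$ vertices. On a 1000-node ring split across 4 machines, for example, every machine ends up with roughly 250 vertices.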

In the remainder of this section we prove the existence of efficient algorithms for solving the PHKPR and local cluster problems in the k-machine model. Our main tool is the Conversion Theorem of [13].

Define $M$ as the message complexity: the worst-case total number of messages sent during a run of the algorithm. Also define $C$ as the communication degree complexity: the maximum number of messages sent or received by any node in any round of the algorithm. The Conversion Theorem is restated below in these terms.

Theorem 6 (Conversion Theorem [13]). Suppose there is an algorithm $A_C$ that solves problem $P$ in the CONGEST model for any $n$-node graph $G$ with probability at least $1 - \epsilon$ in time $T_C(n)$. Further, let $A_C$ use message complexity $M$ and communication degree complexity $C$. Then there exists an algorithm $A$ that solves $P$ for any $n$-node graph $G$ with probability at least $1 - \epsilon$ in the k-machine model in $\tilde{O}\left(\frac{M}{k^2} + T_C(n)\left\lceil\frac{C}{k}\right\rceil\right)$ rounds with high probability.
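The shape of the Conversion Theorem bound becomes clearer when it is evaluated for a few values of $k$. The helper below is a rough numeric guide, not part of [13]. It drops the polylogarithmic factors hidden by $\tilde{O}$, and the input numbers are arbitrary illustrations.

```python
import math


def conversion_rounds(M: float, T_c: float, C: float, k: int) -> float:
    """M/k^2 + T_c * ceil(C/k): the Conversion Theorem bound with the
    polylogarithmic factors hidden by tilde-O dropped."""
    if k < 1:
        raise ValueError("need at least one machine")
    return M / k**2 + T_c * math.ceil(C / k)


for k in (10, 100, 1000):
    print(k, conversion_rounds(M=1e6, T_c=10, C=1e4, k=k))
```

For these inputs the bound drops from 20000 to 1100 to 101 rounds. Because $\lceil C/k \rceil \ge 1$, it can never fall below $T_C$: adding machines cannot beat the round complexity of the simulated CONGEST algorithm.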

In the forthcoming theorems, by “high probability” we mean with probability 
at least 1 — 1/n. 

We note that the proof of the Conversion Theorem is constructive, describing precisely how an algorithm $A_k$ in the k-machine model simulates the algorithm $A_C$ in the CONGEST model. We omit the simulation here but encourage the reader to refer to the proof for implementation details.

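By Theorem 2, PHKPR values can be estimated to within $\epsilon$-accuracy in $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ rounds. A total of $O\left(\frac{1}{\epsilon^3}\log n\right)$ messages are generated and propagated for at most $O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$ random walk steps. This gives a total of $O\left(\frac{\log(\epsilon^{-1})}{\epsilon^3\log\log(\epsilon^{-1})}\log n\right)$ messages sent during a run of the algorithm. In the first random walk step, each of the $O\left(\frac{1}{\epsilon^3}\log n\right)$ messages may be passed from the seed node to one of its neighbors, so the communication degree complexity is $O\left(\frac{1}{\epsilon^3}\log n\right)$. Since $M = O(T_C(n)\,C)$ with $T_C(n) = O\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\right)$, the $M/k^2$ term of Theorem 6 is dominated by its second term, and $\lceil C/k \rceil \le C/k + 1$. Therefore we arrive at the following.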

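Theorem 7. There exists an algorithm that solves the PHKPR problem for any $n$-node graph in the k-machine model with probability at least $1 - \epsilon$ and runs in $\tilde{O}\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})}\left(\frac{1}{\epsilon^3 k} + 1\right)\right)$ rounds with high probability.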
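The message counting that precedes Theorem 7 can be checked empirically on small graphs. This sketch simulates $r$ random-walk tokens from the seed and records $M$ and $C$. It makes three assumptions that may differ from the paper's DistributedHKPR:

- Each walk length is drawn from Poisson($t$) and truncated at $K$, the usual heat kernel pagerank estimator.
- Every token move counts as one message.
- Tokens sharing an edge are not aggregated.

```python
import math
import random
from collections import Counter
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def poisson(rng: random.Random, t: float) -> int:
    """Knuth's multiplication method; adequate for small t."""
    limit, k, p = math.exp(-t), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def walk_message_counts(
    graph: Graph, s: int, r: int, t: float, K: int, seed: int = 0
) -> Tuple[int, int, Counter]:
    """Simulate r tokens from s, each taking min(Poisson(t), K) steps.

    Returns (M, C, endpoints): M is the total number of messages, C is the
    largest number a single node sends or receives in one round, and
    endpoints counts where tokens stop (endpoints[v] / r estimates the vector).
    """
    rng = random.Random(seed)
    lengths = [min(poisson(rng, t), K) for _ in range(r)]
    pos = [s] * r
    M = C = 0
    for step in range(1, K + 1):
        sent: Counter = Counter()
        recv: Counter = Counter()
        for i in range(r):
            if lengths[i] >= step:  # token i is still walking this round
                nxt = rng.choice(sorted(graph[pos[i]]))
                sent[pos[i]] += 1
                recv[nxt] += 1
                pos[i] = nxt
        M += sum(sent.values())
        C = max(C, max(sent.values(), default=0), max(recv.values(), default=0))
    return M, C, Counter(pos)
```

In the first round, every token that walks at all leaves the seed, so $C$ is close to $r$. This matches the $O\left(\frac{1}{\epsilon^3}\log n\right)$ communication degree bound when $r$ is of that order.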

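By Theorem 5, a local cluster about any seed node can be computed in $O\left(\left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right)\log n\right)$ rounds. The message complexity is $O\left(\frac{\log(\epsilon^{-1})}{\epsilon^3\log\log(\epsilon^{-1})}\log n\right)$ for the PHKPR phase and $O\left(\frac{1}{\epsilon}\log n\right)$ for the sweep phase. The total message complexity is therefore $O\left(\left(\frac{\log(\epsilon^{-1})}{\epsilon^3\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right)\log n\right)$. The communication degree complexity is $O\left(\frac{1}{\epsilon^3}\log n\right)$ for the PHKPR phase (as above) and $O(\Delta)$ for the sweep phase, where $\Delta$ is the maximum degree in the graph. The communication degree complexity of the whole algorithm is the maximum of these two. We therefore have the following result for the k-machine model.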


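Theorem 8. There exists an algorithm that computes a local cluster for any $n$-node graph in the k-machine model with probability at least $1 - \epsilon$ and runs in

$$\tilde{O}\left(\frac{\log(\epsilon^{-1})}{\epsilon^{3}k^{2}\log\log(\epsilon^{-1})} + \frac{1}{\epsilon k^{2}} + \left(\frac{\log(\epsilon^{-1})}{\log\log(\epsilon^{-1})} + \frac{1}{\epsilon}\right)\left(\frac{\max\{\epsilon^{-3}, \Delta\}}{k} + 1\right)\right)$$

rounds with high probability, where $\Delta$ is the maximum degree in the graph.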
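The bookkeeping behind Theorem 8 follows a simple rule for two phases run back to back: rounds and messages add, and the communication degree is the larger of the two phases. The sketch below encodes that rule and then applies the Conversion Theorem bound. `local_cluster_profile` is a hypothetical instantiation of the quantities above, with constants and $\log n$ factors dropped. It requires $\epsilon < 1/e$ so that $\log\log(\epsilon^{-1}) > 0$.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """CONGEST cost profile of one phase (constants and polylog factors dropped)."""
    T: float  # rounds
    M: float  # total messages
    C: float  # max messages sent or received by one node in one round


def sequence(a: Cost, b: Cost) -> Cost:
    # Phases run one after the other: rounds and messages add,
    # while the per-round degree is the worse of the two phases.
    return Cost(T=a.T + b.T, M=a.M + b.M, C=max(a.C, b.C))


def k_machine_rounds(c: Cost, k: int) -> float:
    # Conversion Theorem bound M/k^2 + T*ceil(C/k), polylog factors dropped.
    return c.M / k**2 + c.T * math.ceil(c.C / k)


def local_cluster_profile(eps: float, max_degree: int) -> Cost:
    """Hypothetical instantiation of the PHKPR and sweep costs (needs eps < 1/e)."""
    L = math.log(1 / eps) / math.log(math.log(1 / eps))
    phkpr = Cost(T=L, M=L / eps**3, C=1 / eps**3)
    sweep = Cost(T=1 / eps, M=1 / eps, C=max_degree)
    return sequence(phkpr, sweep)
```

When $\Delta$ exceeds $\epsilon^{-3}$, for example $\epsilon = 0.01$ with $\Delta = 10^7$, the sweep phase sets $C$. The $\Delta/k$ term then dominates the bound in Theorem 8.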